Illuminating the Path
The Research and Development Agenda for Visual Analytics

Executive Summary, Chapter 4


Data Representations and Transformations (Chapter 4)

Visualization is intended to represent data and information in a way that the analyst can act upon. The quality of a visualization is most directly affected by the quality of the data representation that underlies it.

Data must be transformed into a representation that is appropriate to the analytical task and that appropriately conveys the important content of a large, complex, and dynamic collection. A data transformation is a computational procedure that converts between data representations. Data transformations are used to convert data into new, semantically meaningful forms. For example, linguistic analysis can be used to assign meaning to the words in a text document. Data transformations may also be used to determine the optimal way to display data, such as by creating a two-dimensional representation of data with hundreds or thousands of dimensions.
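To make the notion of such a transformation concrete, the sketch below uses random linear projection, one simple and scalable dimensionality-reduction technique, to map 100-dimensional records onto two dimensions suitable for display. The data and the choice of technique here are purely illustrative, not a method prescribed by this agenda.

```python
import random

def random_projection(vectors, out_dim=2, seed=0):
    """Project high-dimensional vectors down to out_dim dimensions
    using a fixed random linear map (a simple, scalable
    dimensionality-reduction transform)."""
    rng = random.Random(seed)
    in_dim = len(vectors[0])
    # One random direction per output dimension.
    proj = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)]
            for _ in range(out_dim)]
    return [
        tuple(sum(p * x for p, x in zip(row, vec)) for row in proj)
        for vec in vectors
    ]

# A toy collection of five 100-dimensional feature vectors.
data = [[float((i * j) % 7) for j in range(100)] for i in range(5)]
points_2d = random_projection(data)  # five (x, y) points, ready to plot
```

Because the projection is linear, it scales to very large collections, though a richer transform would be needed to faithfully preserve cluster structure.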

Transforming and representing data is complex for many reasons. The first issue is the sheer number of different types of data that may be analyzed: text in the form of short or long documents in many languages, numeric data from sensors, structured data from relational databases, audio and video, and image data. Each of these types of data may need to be transformed in different ways to facilitate visual analysis.

The massive scale and dynamic nature of data dictate that the transformations must be fast, flexible, and capable of operating at many levels of abstraction. Data are of varying levels of certainty and reliability, so these assessments of quality must be preserved and presented. Data of different types are often required to conduct an analysis, so it is very important to develop a data synthesis capability: the capability to bring data of different types together in a single environment so that analysts can concentrate on the meaning of the data rather than on the form in which they were originally packaged.

The panel recommends several actions to advance the community’s capabilities for data representation and transformation.

Recommendation

Develop both theory and practice for transforming data into new scalable representations that faithfully represent the content of the underlying data.

From the standpoint of the analyst, border guard, or first responder, information provides guidance, insight, and support for assessments and decisions. Our goal is to illuminate the potentially interesting content within the data so that users may discover important and unexpected information buried within massive volumes of data. Each type of data presents its own challenges for data representation and transformation. In most cases, data representations are not meant to replace the original data but to augment them by highlighting relevant nuggets of information to facilitate analysis.

We must develop mathematical transformations and representations that can scale to deal with vast amounts of data in a timely manner. These approaches must provide a high-fidelity representation of the true information content of the underlying data. They must support the need to analyze a problem at varying levels of abstraction and consider the same data from multiple viewpoints.
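The need to work at varying levels of abstraction can be illustrated with a small aggregation sketch: the same event data rolled up to two different resolutions. The per-minute event format below is an invented example, not a format proposed by the panel.

```python
from collections import defaultdict

def aggregate(events, level):
    """Roll fine-grained event counts up to a coarser level of
    abstraction ("hour" or "day"), so the same data can be viewed
    at multiple resolutions. Events are (day, hour, minute, count)."""
    totals = defaultdict(int)
    for day, hour, minute, count in events:
        key = (day, hour) if level == "hour" else (day,)
        totals[key] += count
    return dict(totals)

events = [(1, 9, 5, 3), (1, 9, 40, 2), (1, 10, 0, 7), (2, 9, 15, 1)]
by_hour = aggregate(events, "hour")   # {(1, 9): 5, (1, 10): 7, (2, 9): 1}
by_day = aggregate(events, "day")     # {(1,): 12, (2,): 1}
```

An analyst scanning daily totals can drill into the hourly view of the same underlying events without a separate data pipeline for each resolution.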

Data are dynamic and may be found in ever-growing collections or in streams that may never be stored. New representation methods are needed to accommodate the dynamic and sometimes transient nature of data. Transformation methods must include techniques to detect changes, anomalies, and emerging trends.
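As a minimal sketch of change detection over a stream that is never stored, the detector below retains only running statistics (maintained with Welford's online algorithm) and flags values more than three standard deviations from the running mean. Both the method and the threshold are illustrative assumptions, not the panel's prescription.

```python
import math

class StreamingAnomalyDetector:
    """Flags stream values far from the running mean without
    storing the stream itself (a toy change-detection sketch)."""

    def __init__(self, threshold=3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0              # sum of squared deviations (Welford)
        self.threshold = threshold

    def observe(self, x):
        """Return True if x looks anomalous, then fold x into the stats."""
        anomalous = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                anomalous = True
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

detector = StreamingAnomalyDetector()
stream = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 50.0]
flags = [detector.observe(x) for x in stream]  # only the last value is flagged
```

The detector's memory footprint is constant regardless of stream length, which is the property transient data demand.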

Methods exist at varying levels of maturity for transforming data. For example, there are a variety of methods for transforming the content of textual documents using either statistical or semantic approaches. Combining the strengths of these two approaches may greatly improve the results of the transformation.
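A familiar instance of the statistical approach is term frequency-inverse document frequency (TF-IDF) weighting, sketched below on invented toy documents. A semantic approach would instead assign meanings to terms, and the two may be combined, for example by weighting semantic annotations statistically.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Statistical text transformation: weight each term in each
    document by term frequency times inverse document frequency,
    so terms common to every document receive zero weight."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                      # document frequency per term
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

docs = [
    "the border sensor reported an alert",
    "the analyst reviewed the alert",
    "video data from the sensor",
]
vecs = tf_idf(docs)   # "the" gets weight 0; "sensor" gets a positive weight
```

The resulting sparse vectors are a data representation in the chapter's sense: they do not replace the documents but expose their distinguishing content for further analysis.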

Recommendation

Create methods to synthesize information of different types and from different sources into a unified data representation so that analysts, first responders, and border personnel may focus on the meaning of the data.

Complex analytical tasks require the user to bring together evidence from a variety of data types and sources, including text sources in multiple languages, audio, video, and sensor data. Today's analytical tools generally require that the user consider data of different types separately. However, users need to be able to understand the meaning of their information and to consider all the evidence together, without being restricted by the type of data that the evidence originally came in. Furthermore, they need to be able to consider their information at different levels of abstraction.

Synthesis is essential to the analysis process. While it is related to the concept of data fusion, it entails much more than placing information of different types on a map display. The analytical insight required to meet homeland security missions requires the integration of relationships, transactions, images, and video at the true meaning level. While spatial elements may be displayed on a map, the non-spatial information must be synthesized at the meaning level with that spatial information and presented to the user in a unified representation.
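One way to picture a unified, meaning-level representation is a single record type to which every source is reduced, with synthesis expressed as linking records that share meaning-level elements. The schema and the linking rule below are illustrative assumptions only, not a design proposed in this agenda.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

@dataclass(frozen=True)
class Evidence:
    """A unified, meaning-level record. Every source type (text,
    video, sensor, ...) is reduced to the same fields, so analysis
    code never depends on the original packaging of the data."""
    source_type: str                       # "text", "video", "sensor", ...
    summary: str                           # meaning-level description
    entities: FrozenSet[str]               # people/places/things involved
    location: Optional[Tuple[float, float]] = None   # (lat, lon) if spatial

def related(a: Evidence, b: Evidence) -> bool:
    """Synthesis at the meaning level: two pieces of evidence are
    linked when they mention a common entity, regardless of type."""
    return bool(a.entities & b.entities)

report = Evidence("text", "truck sighted near checkpoint",
                  frozenset({"truck", "checkpoint 4"}))
camera = Evidence("video", "vehicle on access road",
                  frozenset({"truck"}), location=(32.5, -117.0))
```

Here a text report and a video clip become comparable, linkable objects; the spatial field lets the camera record also appear on a map without making the map the only point of integration.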

Recommendation

Develop methods and principles for representing data quality, reliability, and certainty measures throughout the data transformation and analysis process.

By nature, data are of varying quality, and most data have levels of uncertainty associated with them. Furthermore, the reliability of data may differ based on a number of factors, including the data source. As data are combined and transformed, the uncertainties may become magnified. These uncertainties may have profound effects on the analytical process and must be portrayed to users to inform their thinking. They will also make their own judgments of data quality, uncertainty, and reliability based upon their expertise. These judgments must be captured and incorporated as well. Furthermore, in this environment of constant change, assessments of data quality or uncertainty may be called into question at any time based on the existence of new and conflicting information.

The complexity of this problem will require algorithmic advances to address the establishment and maintenance of uncertainty measures at varying levels of data abstraction.
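As an illustration of carrying uncertainty measures through transformations, the sketch below tags each value with a confidence score and propagates it whenever values are synthesized. The multiplicative propagation rule is a deliberately simple assumption, chosen only to show how combining evidence can magnify uncertainty; it is not a rule endorsed by the panel.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assessed:
    """A value carrying a confidence in [0, 1] that travels with it
    through every transformation, instead of being discarded."""
    value: str
    confidence: float

def synthesize(summary, *sources):
    """Derive a new assessed value from assessed inputs. Multiplying
    the source confidences (an illustrative assumption) means a
    conclusion is never more certain than any evidence behind it."""
    conf = 1.0
    for s in sources:
        conf *= s.confidence
    return Assessed(summary, conf)

sighting = Assessed("vehicle at checkpoint", 0.9)   # reliable sensor
tip = Assessed("vehicle is the suspect's", 0.8)     # less reliable source
conclusion = synthesize("suspect at checkpoint", sighting, tip)
```

Two fairly reliable inputs (0.9 and 0.8) yield a noticeably less certain conclusion (0.72), matching the observation above that uncertainties may become magnified as data are combined. An analyst's own reliability judgment could be folded in as one more `Assessed` source.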

